Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping

Authors

  • Rich Caruana
  • Steve Lawrence
  • C. Lee Giles
Abstract

The conventional wisdom is that backprop nets with excess hidden units generalize poorly. We show that nets with excess capacity generalize well when trained with backprop and early stopping. Experiments suggest two reasons for this: 1) Overfitting can vary significantly in different regions of the model. Excess capacity allows better fit to regions of high non-linearity, and backprop often avoids overfitting the regions of low non-linearity. 2) Regardless of size, nets learn task subcomponents in similar sequence. Big nets pass through stages similar to those learned by smaller nets. Early stopping can stop training the large net when it generalizes comparably to a smaller net. We also show that conjugate gradient can yield worse generalization because it overfits regions of low non-linearity when learning to fit regions of high non-linearity.
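The early-stopping strategy described in the abstract can be sketched as a simple patience-based loop: training halts once validation loss has stopped improving, and the weights from the best epoch are kept. This is a minimal illustration, not the authors' experimental setup; the `patience` parameter and the synthetic loss curve are assumptions.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the best epoch, halting once the
    validation loss has failed to improve for `patience` epochs."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # stop training; restore weights from best_epoch
    return best_epoch

# Synthetic validation curve: improves, then overfits (loss rises again).
curve = [1.0, 0.7, 0.5, 0.45, 0.44, 0.46, 0.50, 0.55, 0.60]
print(early_stopping_epoch(curve))  # → 4 (lowest validation loss, 0.44)
```

In the paper's setting, the same criterion lets a large network stop at the point where it generalizes comparably to a smaller one, rather than training to convergence and overfitting.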


Similar Articles

Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

Effective training of deep neural networks suffers from two main issues. The first is that the parameter spaces of these models exhibit pathological curvature. Recent methods address this problem by using adaptive preconditioning for Stochastic Gradient Descent (SGD). These methods improve convergence by adapting to the local geometry of parameter space. A second issue is overfitting, which is ...


Overfitting and Neural Networks: Conjugate Gradient and Backpropagation

Methods for controlling the bias/variance tradeoff typically assume that overfitting or overtraining is a global phenomenon. For multi-layer perceptron (MLP) neural networks, global parameters such as the training time (e.g. based on validation tests), network size, or the amount of weight decay are commonly used to control the bias/variance tradeoff. However, the degree of overfitting can vary...


Early Stopping as Nonparametric Variational Inference

We show that unconverged stochastic gradient descent can be interpreted as a procedure that samples from a nonparametric approximate posterior distribution. This distribution is implicitly defined by the transformation of an initial distribution by a sequence of optimization steps. By tracking the change in entropy over these distributions during optimization, we form a scalable, unbiased estim...


An Efficient Optimization Method for Extreme Learning Machine Using Artificial Bee Colony

Traditional learning algorithms based on gradient descent, such as back-propagation (BP) and its variant Levenberg-Marquardt (LM), have been widely used to train multilayer feedforward neural networks. Gradient descent based algorithms often converge more slowly than required, since many iterative learning steps are needed, and...


A Study on Neural Network Training Algorithm for Multiface Detection in Static Images

This paper reports study results on neural network training algorithms based on numerical optimization techniques for multiface detection in static images. The training algorithms involved are scaled conjugate gradient backpropagation, conjugate gradient backpropagation with Polak-Ribiere updates, conjugate gradient backpropagation with Fletcher-Reeves updates, one-step secant backpropagation, and resilient ba...



Journal:

Volume   Issue 

Pages  -

Publication date: 2000